The data used in the report was obtained from VicRoads, and contains comprehensive information on approximately 200,000 accidents that occurred in the state from 2006 to 2020.
Purpose of this analysis report:
To explore the trends and patterns in Victoria’s road safety data
Identifying which factors are associated with a higher risk of accident or death
And seeking to explain those relationships
Research questions:
Part 1:
Part 2:
Part 3:
TEAM MEMBERS
| Name | Email Address | Student Id |
|---|---|---|
| Kaihao Chen | kche154@student.monash.edu | 27439992 |
| Arek Chouzadjian | acho0007@student.monash.edu | 28644182 |
| Ibrahim Al-Hindi | imalh2@student.monash.edu | 24112488 |
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 0.4420226 | 0.0627854 | 7.040211 | 0e+00 |
| vehicle_year_manuf | -0.0002169 | 0.0000314 | -6.918470 | 1e-07 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5846833 | 0.5724681 | 0.0019542 | 47.86523 | 1e-07 | 1 | 174.5064 | -343.0127 | -338.2622 | 0.0001298 | 34 | 36 |
| Road | Accidents |
|---|---|
| PRINCES HIGHWAY | 3581 |
| HIGH STREET | 3096 |
| NEPEAN HIGHWAY | 2376 |
| SPRINGVALE ROAD | 1663 |
| SOUTH GIPPSLAND HIGHWAY | 1538 |
| SYDNEY ROAD | 1538 |
| MONASH FREEWAY | 1533 |
| MAROONDAH HIGHWAY | 1335 |
| Road | Accidents | Deaths | Deaths_per_accident |
|---|---|---|---|
| GLENELG HIGHWAY | 231 | 28 | 0.1212121 |
| GOULBURN VALLEY HIGHWAY | 344 | 39 | 0.1133721 |
| WIMMERA HIGHWAY | 156 | 17 | 0.1089744 |
| MURRAY VALLEY HIGHWAY | 727 | 76 | 0.1045392 |
| STRZELECKI HIGHWAY | 106 | 11 | 0.1037736 |
| HAMILTON HIGHWAY | 223 | 21 | 0.0941704 |
| MELBA HIGHWAY | 211 | 18 | 0.0853081 |
| NORTHERN HIGHWAY | 300 | 25 | 0.0833333 |
Findings from the data analysis:
Accidents increase gradually throughout the working week, and whilst accidents are most common in evening peak hour, it is the least deadly time of day in which to have an accident
Higher speeds dramatically increase the risk of dying in an accident, and more recent models of car are far less prone to fatal accidents than older varieties. This is due to improved safety standards in vehicle manufacturing in recent decades
Young males are those who are most likely to be involved in an accident, and motorcyclists and pedestrians are most likely to die in an accident. Additionally, regional highways tend to be the deadliest roads in Victoria
C. Sievert. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC Florida, 2020.
D. Kahle and H. Wickham. ggmap: Spatial Visualization with ggplot2. The R Journal, 5(1), 144-161. URL http://journal.r-project.org/archive/2013-1/kahle-wickham.pdf
David Robinson, Alex Hayes and Simon Couch (2021). broom: Convert Statistical Objects into Tidy Tibbles. R package version 0.7.5. https://CRAN.R-project.org/package=broom
Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. URL https://www.jstatsoft.org/v40/i03/.
Jeffrey B. Arnold (2021). ggthemes: Extra Themes, Scales and Geoms for ‘ggplot2’. R package version 4.2.4. https://CRAN.R-project.org/package=ggthemes
Katherine Goode and Kathleen Rey (2019). ggResidpanel: Panels and Interactive Versions of Diagnostic Plots using ‘ggplot2’. R package version 0.3.0. https://CRAN.R-project.org/package=ggResidpanel
Richard Iannone, JJ Allaire and Barbara Borges (2020). flexdashboard: R Markdown Format for Flexible Dashboards. R package version 0.5.2. https://CRAN.R-project.org/package=flexdashboard
Sam Firke (2021). janitor: Simple Tools for Examining and Cleaning Dirty Data. R package version 2.1.0. https://CRAN.R-project.org/package=janitor
Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686
Yihui Xie (2021). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.32.
---
title: "Motor Vehicle Accidents in Victoria"
output:
flexdashboard::flex_dashboard:
vertical_layout: scroll
orientation: rows
source_code: embed
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE,
eval = TRUE,
message = FALSE,
warning = FALSE)
```
```{r libraries}
library(flexdashboard)
library(tidyverse)
library(lubridate)
library(janitor)
library(plotly)
library(ggResidpanel)
library(broom)
library(knitr)
library(kableExtra)
library(ggmap)
library(ggthemes)
```
```{r load-data}
accidents <- read_csv("data/ACCIDENT.csv") %>%
clean_names()
locations <- read_csv("data/ACCIDENT_LOCATION.csv") %>%
clean_names()
nodes <- read_csv("data/NODE.csv") %>%
clean_names()
persons <- read_csv("data/PERSON.csv") %>%
clean_names()
vehicles <- read_csv("data/VEHICLE.csv") %>%
clean_names()
```
```{r create-year-hour-day}
accidents <- accidents %>%
mutate(accidentdate = dmy(accidentdate),
Year = year(accidentdate),
Hour = hour(accidenttime),
Weekday = wday(accidentdate,
label = TRUE,
abbr = FALSE))
```
Introduction {data-icon="fa-address-book"}
=====================================
Row {data-width = 600}
-----
### **Introduction**
The data used in the report was obtained from VicRoads, and contains comprehensive information on approximately 200,000 accidents that occurred in the state from 2006 to 2020.
**Purpose of this analysis report**:
* To explore the trends and patterns in Victoria’s road safety data
* Identifying which factors are associated with a higher risk of accident or death
* And seeking to explain those relationships
**Research questions**:
* Part 1:
- The impact of temporal factors, such as year, weekday and hour, on the number of accidents.
* Part 2:
- The relationship of speed and the age of vehicles with the death rate from accidents.
* Part 3:
- Explores the effect of age and gender on accident numbers, as well as which roads in Victoria are most accident-prone and most deadly.
***
**TEAM MEMBERS**
|Name |Email Address |Student Id|
|:---------------:|:-------------------------:|:--------:|
|Kaihao Chen |kche154@student.monash.edu | 27439992 |
|Arek Chouzadjian |acho0007@student.monash.edu| 28644182 |
|Ibrahim Al-Hindi |imalh2@student.monash.edu | 24112488 |
***
Part 1 {data-icon="fa-battery-quarter"}
=====================================
Row {data-width=600}
--------------------------------
### Accidents per year
```{r chen1}
accidents_per_year <- accidents %>%
count(Year) %>%
ggplot(aes(x = Year,
y = n)) +
geom_line() +
xlab("Year(2006 ~ 2020)") +
ylab("Number of Car Accidents") +
geom_point()
ggplotly( accidents_per_year)
```
### Accidents by weekday
```{r chen2}
accidents %>%
count(Weekday,
name = "Accidents") %>%
ggplot(aes(x = Weekday,
y = Accidents)) +
geom_bar(stat = "identity",
fill = "#999999") +
ylab("Number of Car Accidents") +
geom_text(aes(label = Accidents),
vjust = -1,
color = "black",
size = 3)
```
Row {data-width=600}
--------------------------------
### Accidents by hour
```{r chen3}
accidents_by_hour <- accidents %>%
count(Hour,
name = "Accidents")
p3 <- accidents_by_hour %>%
ggplot(aes(x = Hour,
y = Accidents)) +
geom_line() +
xlab("Time") +
ylab("Number of Car Accidents") +
geom_point()
ggplotly(p3)
```
### Deaths by hour
```{r deaths-by-hour}
deaths_by_hour <- accidents %>%
group_by(Hour) %>%
tally(no_persons_killed,
name = "Deaths")
```
```{r deaths-per-accident-by-hour}
deaths_per_accident_by_hour <- accidents_by_hour %>%
left_join(deaths_by_hour) %>%
mutate(Deaths_per_accident = round(Deaths/Accidents, digit = 4))
```
```{r chen4}
p4 <- deaths_per_accident_by_hour %>%
ggplot(aes(x = Hour,
y = Deaths_per_accident)) +
geom_line() +
xlab("Time") +
ylab("Death Rate of Car Accidents") +
geom_point()
ggplotly(p4)
```
Part 2 {data-icon="fa-battery-half"}
=====================================
Row {.tabset data-height=500}
------------
### **Deaths by speed zone**
```{r accidents-by-speed-zone}
accidents_by_speed_zone <- accidents %>%
count(speed_zone,
name = "Accidents")
```
```{r deaths-by-speed-zone}
deaths_by_speed_zone <- accidents %>%
group_by(speed_zone) %>%
tally(no_persons_killed,
name = "Deaths")
```
```{r deaths-per-accident-by-speed-zone}
deaths_by_accident <- accidents_by_speed_zone %>%
left_join(deaths_by_speed_zone) %>%
mutate(Deaths_by_accident = Deaths/Accidents)
```
```{r deaths-per-accident-plot}
deaths_by_accident %>%
mutate(speed_zone = as.numeric(speed_zone)) %>%
filter(speed_zone %in% seq(30, 110, 10)) %>%
ggplot(aes(y = Deaths_by_accident,
x = speed_zone)) +
geom_line() +
labs(x = "Speed Zone",
y = "Deaths by Accident")
ggplotly()
```
Row {.tabset data-height=500}
------------
### **Death rate by year of vehicle manufacture**
```{r join-person-and-vehicle}
person_vehicle <- persons %>%
left_join(vehicles)
```
```{r total-people-involved-in-accidents-per-manufature-year}
person_vehicle_total <- person_vehicle %>%
group_by(vehicle_year_manuf) %>%
tally(name = "Persons",
sort = TRUE)
person_vehicle_deaths <- person_vehicle %>%
filter(inj_level_desc == "Fatality") %>%
group_by(vehicle_year_manuf) %>%
tally(name = "Fatalities",
sort = TRUE)
```
```{r join-total-and-deaths}
death_rate_by_year_manuf <- person_vehicle_total %>%
left_join(person_vehicle_deaths) %>%
mutate(death_rate = Fatalities/Persons) %>%
arrange(desc(vehicle_year_manuf))
```
```{r plot-death-rate-by-year-manuf}
manuf_year_death_rate <- death_rate_by_year_manuf %>%
filter(vehicle_year_manuf >= 1985 & vehicle_year_manuf < 3001)
p1 <- manuf_year_death_rate %>%
ggplot(aes(x = vehicle_year_manuf,
y = death_rate)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Year Manufactured",
y = "Death Rate")
ggplotly(p1)
```
### **Regression model**
```{r regression-model}
manuf_year_death_rate_lm <- lm(death_rate ~ vehicle_year_manuf, data = manuf_year_death_rate)
resid_panel(manuf_year_death_rate_lm, plot = "all")
```
### **Goodness of fit**
```{r}
tidy(manuf_year_death_rate_lm) %>%
kable() %>%
kable_styling(bootstrap_options = "striped")
glance(manuf_year_death_rate_lm) %>%
kable() %>%
kable_styling(bootstrap_options = "striped")
```
Column {.sidebar data-width=350}
----
***
> **Findings**
1. Accidents become increasingly serious the faster the speed at which they occur. The risk of dying in an accident is nearly 13 times higher in a 110km/h zone (0.064 deaths/accident) than in a 40km/h zone (0.005 deaths/accident).
2. A person's risk of dying in an accident is positively correlated with the age of their vehicle; the newer the make, the less likely it is that a person will be killed in an accident. For every year older a vehicle is, the risk of dying if you have an accident in it increases by 0.0002 deaths per accident. The reason for this is improved safety standards for vehicles.
***
Part 3 {data-icon="fa-battery-three-quarters"}
=====================================
Row {.tabset data-height=800}
---------
### **Accidents Map**
```{r accidents-map}
location_box <- c(min(nodes$long),
min(nodes$lat),
max(nodes$long),
max(nodes$lat))
location_map <- get_map(location = location_box,
source = 'osm')
ggmap(location_map) +
geom_point(data = nodes,
aes(x = long,
y = lat),
colour="#0072B2",
alpha=0.5,
size = 0.05) +
theme_map()
```
### **Roads with Most Accidents and Highest Death Rates**
```{r join-road-names}
roads_accidents <- accidents %>%
left_join(locations) %>%
mutate(Road = paste(road_name,
road_type))
```
```{r accidents-by-road}
accidents_per_road <- roads_accidents %>%
count(Road,
name = "Accidents"
,sort = T)
deaths_per_road <- roads_accidents %>%
group_by(Road) %>%
tally(no_persons_killed,
name = "Deaths",
sort = TRUE)
accidents_per_road %>%
head(8) %>%
kable(caption = "Accidents by road") %>%
kable_styling(bootstrap_options = "striped")
```
```{r deadliest-road}
deadliest_road <- accidents_per_road %>%
left_join(deaths_per_road) %>%
filter(Accidents >= 100) %>%
mutate(Deaths_per_accident = Deaths/Accidents) %>%
arrange(desc(Deaths_per_accident))
deadliest_road %>%
head(8) %>%
kable(caption = "Deadliest roads") %>%
kable_styling(bootstrap_options = "striped")
```
### **Accidents by Gender**
```{r accidents-gender-table}
persons <- persons %>%
mutate(sex = recode(sex,
"F" = "Female",
"M" = "Male",
"U" = "Unknown"))
```
```{r accidents-gender-plot}
persons %>%
filter(road_user_type_desc == "Drivers",
sex %in% c("Female",
"Male"),
age > 15) %>%
count(age,
sex,
name = "Accidents") %>%
ggplot(aes(x = age,
y = Accidents,
color = sex)) +
geom_line() +
xlab("Age")
ggplotly()
```
### **Death Rate by User Type**
```{r user-death-rate}
user_type <- persons %>%
count(road_user_type_desc,
name = "User")
user_type_fatal <- persons %>%
filter(inj_level_desc == "Fatality") %>%
count(road_user_type_desc,
name = "Fatal")
user_type %>%
left_join(user_type_fatal) %>%
mutate(death_rate = Fatal / User) %>%
ggplot(aes(x = fct_reorder(road_user_type_desc,
death_rate),
y = death_rate)) +
geom_col(fill = "#999999") +
labs(x = "User Type",
y = "Death Rate") +
theme(axis.text.x = element_text(angle = 45,
hjust = 1))
```
Column {.sidebar data-width=350}
----
***
> **Findings**
1. The accidents are concentrated around Melbourne and branch out as we get further away. Small concentrations are also present around other cities and towns
2. Regional highways are the deadliest types of roads
3. Males commit more accidents than females. Both commit the highest number of accidents as new drivers at a young age, the accidents steadily decrease as the age increases
4. Pedestrians are the biggest casualty of accidents with the highest number of fatalities. Bicyclists are the least at risk of death
***
Conclusions {data-icon="fa-table"}
=====================================
Findings from the data analysis:
1. Accidents increase gradually throughout the working week, and whilst accidents are most common in evening peak hour, it is the least deadly time of day in which to have an accident
2. Higher speeds dramatically increase the risk of dying in an accident, and more recent models of car are far less prone to fatal accidents than older varieties. This is due to improved safety standards in vehicle manufacturing in recent decades
3. Young males are those who are most likely to be involved in an accident, and motorcyclists and pedestrians are most likely to die in an accident. Additionally, regional highways tend to be the deadliest roads in Victoria
References {data-icon="fa-table"}
=====================================
Row
----
### Data
* VicRoads. (2020). *Crash Stats*[Data file]. Retrieved from https://discover.data.vic.gov.au/dataset/crash-stats-data-extract
### Software
* R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical
Computing, Vienna, Austria. URL https://www.R-project.org/.
### R Packages
* C. Sievert. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman
and Hall/CRC Florida, 2020.
* D. Kahle and H. Wickham. ggmap: Spatial Visualization with ggplot2. The R Journal, 5(1), 144-161. URL
http://journal.r-project.org/archive/2013-1/kahle-wickham.pdf
* David Robinson, Alex Hayes and Simon Couch (2021). broom: Convert Statistical Objects
into Tidy Tibbles. R package version 0.7.5. https://CRAN.R-project.org/package=broom
* Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of
Statistical Software, 40(3), 1-25. URL https://www.jstatsoft.org/v40/i03/.
* Jeffrey B. Arnold (2021). ggthemes: Extra Themes, Scales and Geoms for 'ggplot2'. R package version 4.2.4.
https://CRAN.R-project.org/package=ggthemes
* Katherine Goode and Kathleen Rey (2019). ggResidpanel: Panels and Interactive Versions
of Diagnostic Plots using 'ggplot2'. R package version 0.3.0.
https://CRAN.R-project.org/package=ggResidpanel
* Richard Iannone, JJ Allaire and Barbara Borges (2020). flexdashboard: R Markdown Format for Flexible
Dashboards. R package version 0.5.2. https://CRAN.R-project.org/package=flexdashboard
* Sam Firke (2021). janitor: Simple Tools for Examining and Cleaning Dirty Data. R package
version 2.1.0. https://CRAN.R-project.org/package=janitor
* Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686,
https://doi.org/10.21105/joss.01686
* Yihui Xie (2021). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version
1.32.
Row
----
Thank You and Stay Safe!